SUMMARY


F-4 pilots and Weapons Systems Officers (WSOs) from nine mission-ready
squadrons across the United States provided rating and error
evaluation data on their performance of the pop-up weapon-delivery
maneuver. The maneuver was subdivided into eight components.

Aircrews  assessed  their  performance  on  each  component  of  545  scored 
maneuvers.  They  also  listed  causes  of  less-than-optimal  performance 
on  particular  components  of  each  delivery.  This  information  was 
compiled  and  analyzed  to  determine  the  relative  contribution  of  each 
of  the  components  to  the  accuracy  of  weapon  delivery. 

The  major  result  of  these  analyses  was  that  rated  performance  on 
the  final  few  seconds  of  the  maneuver,  during  which  the  crew  is  trying 
to  execute  a  constant  angle,  high-speed  dive,  is  clearly  the  best 
predictor  of  weapon-delivery  accuracy.  Several  aspects  of  the  data 
show  this  finding  is  robust  and  not  artifactual.  The  same  result  was 
obtained  for  both  pilots  and  WSOs  at  each  wing  and  for  both  ratings 
and  reported  error  frequencies. 

The  outcome  of  this  study  indicates  the  analysis  of 
self-assessment  data  can,  under  some  conditions,  yield  important 
information  about  the  components  of  a  skill.  The  use  of 
self-assessment  data,  however,  is  not  advocated  in  situations  where 
reasonably  economical  and  unobtrusive  direct  measures  of  performance 
are  available. 
POP-UP WEAPON-DELIVERY MANEUVER: USE OF PILOT DATA
IN  ANALYSIS  OF  CRITICAL  COMPONENTS 


INTRODUCTION 

This  report  is  based  on  data  collected  during  a  large  scale  test 
of  methodology  developed  for  the  Air  Force  Skill  Maintenance  and 
Reacquisition  Training  Program  (Project  SMART),  which  has  the  goal  of 
defining  and  measuring  the  basic  skills  which  support  aircrew  mission 
readiness.  The  test  was  requested  by  Tactical  Air  Command 
Headquarters  during  a  preliminary  evaluation  of  Project  SMART.  The 
purposes  of  the  report  are  twofold:  (a)  present  the  results  of  a 
self-assessment  methodology  explored  by  Project  SMART  to  analyze  skill 
in  pop-up  weapon  delivery  and  (b)  document  some  of  the  strengths  and 
weaknesses found with this approach. The present report is concerned
with the results and effectiveness of the methodology; a detailed
description of preliminary development efforts is presented in Pierce,
DeMaio, Eddowes, and Yates (1979).

In order to adequately assess mission readiness, Project SMART has
focused  its  data  collection  efforts  on  continuation  training.  Initial 
efforts  consisted  of  interviewing  pilots  at  operational  squadrons  to 
determine  (a)  the  tasks  most  critical  to  their  mission  and  (b)  the 
important  parameters  underlying  performance  of  these  tasks.  On  the 
basis  of  the  interviews,  the  pop-up  weapon  delivery  (Pierce  et  al., 
1979)  and  low  altitude  tactical  formation  (DeMaio,  Eddowes,  1980) 
tasks  were  selected  for  further  study.  Aircrews  were  then  asked  to 
make  detailed  assessments  of  their  own  performance  on  these  tasks. 

This  report  concerns  the  analysis  of  pop-up  weapon  delivery  skill 
based  on  these  data. 

Collecting  subjective  data  from  experts  might  be  expected  to  have 
some  inherent  advantages.  It  should  have  no  impact  on  operational 
equipment  and  does  not  require  long  data  collection  sessions  that 
might  interfere  with  scheduling.  Because  of  this,  it  is  sometimes 
considered  an  acceptable  last  resort  in  situations  where  actual 
performance  data  cannot  be  collected.  Unfortunately,  aircrew 
self-assessment  may  also  be  viewed  as  a  potentially  misleading  source 
of  data.  Nisbett  and  Wilson  (1977)  reviewed  a  number  of  studies  that 
demonstrate  the  difficulty  of  making  correct  inferences  from 
subjective  reports,  even  when  such  reports  do  not  necessarily  imply 
self-evaluations. 

Can  pilot  self-assessment  be  a  useful  source  of  data  for 
understanding  the  components  of  mission  readiness?  Before  the 
specific  case  examined  in  this  report  is  discussed,  the  general  issues 
involved  will  be  reviewed  a  little  more  closely. 
USE OF PILOT SELF-ASSESSMENT


Consider  the  following  common  situation.  A  pilot  performs  some 
maneuver  that  can  be  objectively  scored  (for  example,  navigation  to  a 
point,  or  weapon  delivery).  The  pilot  then  is  informed  of  the  score 
received.  After  returning  from  the  sortie,  the  pilot  may  be  asked  for 
an  evaluation  of  how  well  the  maneuver  was  performed.  There  are 
several  ways  in  which  the  pilot  may  arrive  at  this  evaluation.  At 
best,  the  pilot  might  try  to  recall  the  particular  characteristics  of 
the  performance  and  compare  them  to  some  performance  ideal.  However, 
the  pilot  might  instead  give  the  answer  implied  by  the  score  without 
considering  other  aspects  of  the  performance.  Or  worse,  the  pilot 
could  simply  give  the  evaluation  believed  to  be  the  most  pragmatic, 
without  referring  to  the  actual  performance  at  all. 

One  way  to  distinguish  between  these  possibilities  is  to  ask  for 
evaluations  of  the  components  of  the  performance  instead  of  just  an 
overall  evaluation.  If  the  pilot  is  referring  only  to  information 
about  the  desirability  of  the  evaluation  or  the  known  success  of  the 
maneuver,  then  the  evaluation  of  all  the  components  should  covary 
strongly,  or  receive  roughly  the  same  evaluation.  However,  to  the 
extent  that  the  evaluation  of  different  task  components  shows 
different  characteristics,  confidence  increases  that  some 
component-specific  knowledge  is  being  tapped;  that  the  pilot  is  making 
an  effort  to  accurately  assess  the  performance. 

Unfortunately,  even  if  different  components  of  performance  receive 
different  ratings,  this  does  not  prove  that  the  pilot's  judgments  are 
based  on  accurate  memory  for  performance.  Component  ratings  could 
also be based on reconstructions of plausible performance. For
example, if it is part of general pilot lore or accumulated personal
experience  that  a  particular  part  of  a  maneuver  is  the  most  difficult, 
self-evaluation  of  this  part  might  be  adjusted  accordingly. 

Fortunately,  there  are  some  non-experimental  techniques  for 
distinguishing  between  memory  for  actual  performance  and  plausible 
reconstruction.  One  way  is  to  ask  the  pilot  to  recall  aircraft 
parameters  (airspeed,  altitude,  etc.)  which  characterized  each 
component.  Ideally,  these  could  be  compared  to  the  parameters  that 
the  pilot  was  trying  to  achieve.  If  consistent  differences  exist, 
then  evidence  is  gained  supporting  accurate  memory  for  performance. 

Another  approach,  involving  still  more  detail,  is  to  require  the 
pilot  to  recall  the  specific  errors  made.  The  lack  of  reported  errors 
will  not  be  very  revealing,  but  their  presence  provides  data  to 
compare  with  low  self-evaluations.  As  a  general  principle,  one  should 
require  as  much  detail  in  self-assessment  as  the  constraints  of  the 
situation  will  permit. 

In  the  study  described  here,  both  component  ratings  and  reported 
error  data  were  collected  on  performance  of  a  short  duration, 
relatively  difficult  maneuver  (pop-up  weapon  delivery). 


TASK  DESCRIPTION 


The pop-up weapon delivery was identified by pilots very early in
the Project SMART effort as a critical air-to-ground maneuver in the
tactical fighter pilot's repertoire. The maneuver is designed to be
used after the pilot has flown a low altitude route to a target.
After reaching a preplanned pull-up point, the pilot climbs quickly
and rolls over to an apex altitude and then dives from a prespecified
position (called the track point) at a constant dive angle toward the
release point, from which a weapon is released. For purposes of
analysis, this maneuver was broken down into meaningful discrete
segments on the basis of interviews with pilots (Pierce et al.,
1979). Figure 1, taken from Pierce et al., shows the resulting
segmentation of the maneuver.

In the present study, mission-ready F-4 pilots were asked to make
detailed assessments of the crew's performance on the aforementioned
components of the pop-up delivery. An examination of the
relationships between component assessments and accuracy was expected
to isolate the critical components of the maneuver.


METHOD


Participants 

Pilots  and  WSOs  from  nine  operational  F-4  squadrons  participated 
in  the  study.  Squadrons  participating  were  those  of  the  474th 
Tactical Fighter Wing (TFW) at Nellis AFB, the 347th TFW at Moody
AFB, and the 4th TFW at Seymour Johnson AFB.

Procedure 

Pilots  and  WSOs  were  asked  to  make  detailed  assessments  of  the 
crew's  performance  on  each  of  the  aforementioned  segments  of  the 
pop-up  each  time  the  maneuver  was  performed.  These  assessments  took 
the  form  of  ratings  on  a  four-point  scale  (excellent,  satisfactory, 
marginal,  and  unsatisfactory)  of  performance  on  each  segment  and 
written  comments  consisting  largely  of  explanations  of  errors  made  on 
each  segment.  In  addition,  bomb  scores  (miss  distances)  were  recorded 
for  each  controlled  range  delivery,  and  the  delivery  outcome  (bull, 
hit,  miss,  dry,  or  abort)  was  recorded  for  tactical  range  deliveries. 

These  assessments  were  gathered  on  the  form  shown  in  Figure  2 
according  to  the  detailed  instructions  (Figure  3).  During  the 
debriefing  session  of  daily  sorties,  approximately  80  percent  of  these 
forms  were  collected  by  members  of  the  Project  SMART  staff,  who  were 
present  to  answer  questions  about  the  procedure.  The  remainder  of  the 
forms  were  filled  out  in  the  absence  of  a  project  researcher  by  crews 
who  had  nevertheless  been  formally  briefed  on  the  assessment  procedure. 


FIG. 1. DELIVERY PROFILE (From Pierce et al., 1979).


POP-UP  EVALUATION  FORM 

PILOT #: _____    SQUADRON #: _____    RANGE #: _____

EVENT: _____    BLOCK: _____    DATE: _____

PASS #                    1st   2nd   3rd   4th
BOMB SCORES               ___   ___   ___   ___

Task Evaluation                                    COMMENTS/INDICATE PASS #

 1. Approach to PUP       ___   ___   ___   ___    _____
 2. PUP
 3. Climb Leg
 4. Target Acquisition
 5. Pull Down Point
 6. Apex
 7. Track Point
 8. Bomb Run
 9. Recovery
10. Rtn to Low Alt
11. Exposure Time

Legend: E - Excellent   S - Satisfactory   M - Marginal   U - Unsatisfactory


FIG.  2.  SELF-ASSESSMENT  FORM  FOR  POP-UP  WEAPON  DELIVERY. 
INSTRUCTIONS

1. COMPLETE THE PILOT IDENTIFICATION PORTION OF THE FORM.

2. RECORD BOMB SCORES.

3. GRADE THE TASK EVALUATION SECTION AS FOLLOWS:

E - Excellent. Task performance met criteria with no error,
reflecting an unusually high degree of ability; no compensations
were required.

S - Satisfactory. Task performance met criteria with minimal error;
minimal compensations were required.

M - Marginal. Task performance met criteria with error;
compensations were required to salvage the pass delivery.

U - Unsatisfactory. Task performance did not meet criteria; gross
errors in performance led to either an unsafe or aborted pass.

Task evaluations are to be based on 1) proficiency to maneuver the
aircraft, 2) situation awareness, 3) aggressiveness and 4) survivability.

Any item graded as either M or U requires an appropriate explanation under the
Comments section.

The following indicates those requirements identified with each item
included in the Task Evaluation section:

1. Approach to PUP: (a) Acquisition of PUP; (b) Altitude control;
(c) Airspeed control; and (d) Heading control.

2. PUP: (a) Heading correction; (b) "G" application; (c) Airspeed
correction; and (d) Timing/distance error.

3. Climb Leg: (a) Climb angle corrections; and (b) Airspeed corrections.

4. Target Acquisition: Self-explanatory.

5. Pull Down Point: (a) Roll; (b) Airspeed corrections; (c) "G"
application; and (d) Altitude/position control.

6. Apex: (a) Pattern/position correction; and (b) Airspeed corrections.

7. Track Point: (a) Aim off point; (b) Roll out; (c) Initial wind
correction; (d) Angle, azimuth, and position check; and (e) Initial
pipper placement.

8. Bomb Run: (a) Aiming error corrections; (b) Airspeed control;
(c) Tracking time control; and (d) Altitude, azimuth, and dive angle
corrections.

9. Recovery: (a) "G" application; (b) Jinking; and (c) Altitude and timing
control.

10. Return to Low Altitude: Transition to low altitude.

11. Exposure Time: Minimization of total time spent out of low altitude
environment.


FIG. 3. INSTRUCTIONS ACCOMPANYING SELF-ASSESSMENT FORM.
A very similar self-assessment procedure has been shown to produce
segment  ratings  which,  when  combined  into  an  overall  score  for  each 
delivery,  are  related  to  bombing  accuracy  of  A-7  pilots  (Pierce  et 
al.,  1979).  However,  due  to  a  Tactical  Air  Command  requirement  that 
this  research  minimize  interference  with  the  normal  training  routine, 
it  was  not  possible  in  either  the  Pierce  et  al.  (1979)  study  or  the 
current  one  to  withhold  from  the  crewmembers  knowledge  of  their  actual 
bomb  scores.  Thus,  the  simplest  hypothesis  to  explain  any  overall 
tendency  for  higher  ratings  and  low  bomb  scores  to  be  related  is  that 
the  ratings  do  not  reflect  actual  performance  on  individual  segments, 
but  rather  are  based  on  retrospective  inferences  from  the  success  or 
failure  of  the  delivery.  This  explanation  predicts  that  there  should 
be  no  systematic  differences  in  the  characteristics  of  the  ratings  for 
different  segments.  However,  the  present  report  contains  analyses  of 
the  segment  ratings  which  show  very  sharp  differences  between 
segments.  Such  differences  reflect  the  use  of  segment-specific 
information  in  producing  segment  ratings. 

Analysis 

To  look  directly  at  the  extent  to  which  rated  performance  on  each 
pop-up  segment  was  related  to  bombing  accuracy,  the  following  analysis 
was  performed.  First,  bomb  scores  for  the  controlled  range  deliveries 
were  categorized  as  bulls,  hits,  or  misses  according  to  the  criteria 
used  for  this  categorization  by  the  squadrons  themselves.  These 
criteria  (given  in  Appendix  A)  differ  for  different  events  and  are  the 
same  as  the  criteria  by  which  the  tactical  range  scores  are 
categorized.  Such  categorization  allows  scores  for  tactical  and 
control  ranges  and  scores  from  different  events  to  be  analyzed 
together,  as  well  as  allowing  separate  analyses  to  be  directly 
compared.  Next,  categorized  bomb  scores  and  ratings  for  each  of  the 
first  eight  segments  of  the  pop-up  (approach  to  pull-up  point  through 
bomb  run)  were  cross-classified.  In  order  to  obtain  sufficient  cell 
frequencies  for  a  chi-squared  test,  bomb  score  categories  were 
collapsed  so  that  bull  and  hits  were  in  one  category  and  the  misses, 
dry  passes,  and  aborted  runs  were  in  another.  Also,  ratings  were 
collapsed  into  two  categories  with  ratings  which  do  not  reflect 
significant  error  (excellent  and  satisfactory)  in  one  category  and 
marginal  and  unsatisfactory  ratings  in  the  other.  Chi-squared  values 
and values of the chi-squared contingency coefficient (φ) were
computed  on  the  resulting  2  x  2  matrices  for  each  of  the  first  eight 
segments  of  the  pop-up  (the  segments  that  precede  the  bomb  delivery). 
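
The 2 x 2 computation described above can be sketched as follows. This is a minimal illustration, not the report's own procedure: the function name `phi_from_2x2` and the cell counts are hypothetical, and the sketch assumes the uncorrected Pearson chi-squared statistic (the report does not say whether a continuity correction was applied).

```python
import math

def phi_from_2x2(a, b, c, d):
    """Return (chi_squared, phi) for the 2 x 2 table [[a, b], [c, d]].

    Rows: rating category (excellent/satisfactory, marginal/unsatisfactory).
    Columns: outcome category (bull/hit, miss/dry/abort).
    Uses the uncorrected Pearson chi-squared; phi = sqrt(chi_squared / n).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi_squared = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    phi = math.sqrt(chi_squared / n)
    return chi_squared, phi

# Hypothetical counts for one segment, NOT data from the study.
chi_squared, phi = phi_from_2x2(60, 20, 10, 30)
print(round(chi_squared, 2), round(phi, 2))
```

Computing one such pair of values per segment gives the kind of segment-by-segment comparison of association strength that the results section reports.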

Another  source  of  data  on  the  question  of  the  criticality  of 
pop-up  segments  is  the  comments  which  supplemented  the  segment  ratings 
on  about  one-third  of  the  deliveries.  Many  of  these  comments 
mentioned  specific  errors  made  on  a  particular  segment,  and  there  were 
a  sufficient  number  of  errors  mentioned  to  allow  computation  of  the 
proportion  of  deliveries  which  resulted  in  misses  or  dry  or  aborted 
passes,  given  that  a  particular  kind  of  error  was  reported.  This 
proportion indicates the severity of the error. It can be compared
across errors introduced on different parts of the maneuver. Errors
which  were  clear  consequences  of  errors  made  earlier  in  a  delivery 
were  not  included  in  this  analysis. 

A  number  of  other  analyses,  conducted  to  examine  more  specific 
issues,  are  described  in  the  results  section.  Because  of  the  number 
of  statistical  tests  conducted,  the  criteria  for  significance  used  are 
p < .0025 for each of four tests of entire distributions and p < .001
for  each  of  37  tests  of  specific  segment/bomb  score  relationships. 
These  values  yield  an  overall  probability  (of  accepting  a  chance 
result  as  significant)  of  less  than  0.05. 
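
The overall error probability quoted above follows from a simple additive (Bonferroni-style) bound: the chance of at least one false positive is no larger than the sum of the per-test criteria. The sketch below only reproduces that arithmetic; the function name `familywise_bound` is an illustrative assumption.

```python
def familywise_bound(test_families):
    """Upper bound on the chance of any false positive.

    test_families is a list of (number_of_tests, per_test_alpha) pairs;
    the bound is the sum of number_of_tests * per_test_alpha.
    """
    return sum(count * alpha for count, alpha in test_families)

# Criteria stated in the text: 4 tests at .0025 and 37 tests at .001.
bound = familywise_bound([(4, 0.0025), (37, 0.001)])
print(round(bound, 3))
```

The bound is 4(.0025) + 37(.001) = .047, which is below the stated overall level of 0.05.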

RESULTS  AND  DISCUSSION 

The  pilots'  ratings  of  segment  performance  are  considered  first. 
Figure  4  shows  the  obtained  values  for  the  strength  of  association 
between  segment  ratings  and  bombing  accuracy  for  each  segment  of  the 
pop-up,  based  on  545  scorable  deliveries  from  nine  operational  F-4 
squadrons.  The  overall  pattern  shows  that  performance  ratings  on  the 
final  two  segments  before  ordnance  delivery,  i.e.,  track  point  and 
bomb  run,  have  by  far  the  strongest  relationship  to  bomb  accuracy. 
Ratings  by  the  WSOs  for  the  same  deliveries  show  substantially  the 
same  pattern,  though  the  association  of  ratings  with  accuracy  is 
somewhat  lower.  These  data  are  shown  in  Figure  5.  WSOs  do  not  get  as 
clear  a  view  of  the  maneuver  as  does  the  pilot;  nor  are  they  directly 
involved  in  controlling  the  aircraft.  Therefore,  the  more  detailed 
analyses  reported  below  were  conducted  on  the  pilot  data  only. 
Nevertheless,  the  WSO  data  confirm  that  track  point  and  bomb  run 
ratings  are  the  best  predictors  of  accuracy. 

Much  the  same  pattern  of  results  holds  for  maneuvers  with  both 
high  (30°)  and  low  (10°  to  15°)  delivery  angles.  Figures  6  and 
7  show  the  results  of  a  breakdown  of  the  data  into  high  versus  low 
angle  deliveries.  The  figures  show  that  for  low  angle  deliveries 
only,  the  predictivity  of  the  target  acquisition  rating  (segment  4)  is 
significant.  Unfortunately,  on  this  one  segment  for  this  partitioning 
of  the  data,  there  are  too  few  observations  in  one  of  the  cells  of  the 
matrix  to  ensure  that  the  chi-squared  estimate  of  significance  is 
accurate.  However,  the  main  result  is  clear:  track  point  and  bomb 
run  segments  were  the  best  predictors  of  accuracy  for  both  kinds  of 
deliveries. 


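FIG. 5. RELATIONSHIP OF SEGMENT RATINGS TO BOMBING ACCURACY, WSOs.

[Partially recovered captions for the delivery-angle figures: "... LOW ANGLE (10° AND 15°) DELIVERIES." and "... (30°) DELIVERIES." Horizontal axis: SEGMENT OF POP-UP.]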


There  are  several  possible  causes  of  the  main  results  of  this 
study. An immediate concern is that the pattern is an artifact of the
ease with which different segments can be rated. It might be
supposed, for example, that track point and bomb run performance
ratings are more accurate and internally consistent than are the
ratings of performance on other segments and that it is these rating
characteristics, rather than actual segment performance, which cause
the observed pattern. However, the importance of the track point and
bomb run to overall accuracy appears in an analysis of reported errors
as well as in the analysis of ratings. Figure 8 gives the proportion
of deliveries that resulted in misses or dry or aborted passes when an
error was reported for a particular part of the delivery. Since there
were few errors reported for some segments, segments were pooled in
ways that preserved a meaningful partition of the maneuver. The
number of errors in each of the pooled segments is noted on the
figure, which also shows that errors reported as occurring during the
track point and bomb run are associated with an increase in the
proportion of bombs off target as compared to earlier stages of the
delivery. This difference is significant (χ² = 10.0, p < .0025).

The  severity  of  track  point  and  bomb  run  errors  can  be  appreciated  by 
noting  that,  while  the  overall  probability  of  failing  to  hit  the 
target  is  0.37,  this  probability  rises  to  0.80  when  a  specific  error 
on  track  point  or  bomb  run  is  reported. 

These  data  can  also  give  us  some  idea  of  the  factors  which  account 
for  less  than  ideal  track  point  and  bomb  run  execution.  Figure  9 
shows  the  relative  frequency  of  various  kinds  of  reported  errors  in 
track  point/bomb  run.  This  distribution  of  errors  differs 
significantly from an equal-frequency distribution (χ² = 36.8,
p < .001). The table shows the failure to correct properly for wind
on  the  bomb  run  was  the  most  common  error,  accounting  for  30%  of  the 
errors  reported.  By  contrast,  wind  correction  was  mentioned  only  once 
among  the  169  errors  reported  for  the  segments  preceding  track  point 
and  bomb  run. 
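
The comparison against an equal-frequency distribution mentioned above is a chi-squared goodness-of-fit test. A minimal sketch follows; the function name and the error counts are hypothetical (the report's actual category counts are not given in the text), so the output is not expected to match the reported value of 36.8.

```python
def chi_squared_equal_frequency(counts):
    """Goodness-of-fit statistic against equal expected frequencies.

    Returns (statistic, degrees_of_freedom), where the expected count in
    every category is the total divided by the number of categories.
    """
    total = sum(counts)
    expected = total / len(counts)
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, len(counts) - 1

# Hypothetical error-category counts, NOT data from the study.
statistic, dof = chi_squared_equal_frequency([30, 25, 20, 15, 10])
print(statistic, dof)
```

A large statistic relative to its degrees of freedom indicates that some error categories are reported far more often than others.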

The  data,  both  ratings  and  errors,  indicate  that  the  track  point 
and  bomb  run  segments  of  the  maneuver  are  the  most  predictive  of  bomb 
accuracy.  There  are  at  least  three  possible  reasons  for  this  result. 

One  possibility  is  that  the  greater  predictivity  of  the  final 
segments is due in part to earlier segments. It is possible that poor
ratings on track point and bomb run reflect errors made on earlier
segments as well as track point/bomb run errors. This hypothesis
implies the predictivity of ratings of track point/bomb run
performance should drop when only deliveries in which high ratings
were given to all earlier segments are considered. Table 1 compares
the predictivity of track point and bomb run ratings for all the data
with that of a subset of the data which meets this condition. As this
table indicates, the relationship between track point and bomb run
ratings and accuracy does not decrease for runs in which early
performance is good. If anything, the relationship decreases when
earlier  segments  were  poorly  executed.  Thus,  early  errors  are  not  the 
source  of  the  predictivity  of  track  point  and  bomb  run  ratings. 


FIG. 9. RELATIVE FREQUENCY OF TRACK POINT AND BOMB RUN ERRORS.


Table  1 


Strength of Relationship to Bomb Accuracy (φ) of Track Point and
Bomb Run Ratings (Segments 7 & 8) for Different Subsets of Deliveries


                                              Track Point    Bomb Run

All deliveries                                    0.30          0.40

Deliveries in which all earlier
segments were rated satisfactory
or excellent                                      0.34          0.44

Deliveries in which at least one
earlier segment was rated marginal
or unsatisfactory                                 0.19          0.30

Two  other  possible  reasons  for  the  greater  predictivity  of  these 
final  segments  are  (a)  they  might  be  the  most  difficult  parts  of  the 
maneuver  to  perform,  and  (b)  independent  of  their  relative  difficulty, 
errors  made  in  the  final  few  seconds  of  the  maneuver  necessarily  leave 
less  time  available  for  corrective  action  than  do  errors  made  earlier 
and,  therefore,  may  be  the  most  likely  to  go  uncorrected  and  produce 
misses. 

There  is  no  direct  index  of  the  relative  difficulty  of  the  parts 
of the maneuver. Figure 10 shows the proportion of marginal and
unsatisfactory ratings given for each segment. The final two segments
received a significantly larger proportion of marginal and
unsatisfactory ratings than did the first six segments (χ² = 144,
p < .001). If this proportion is taken as an index of difficulty
(Pierce et al., 1979), then the match between predictivity and
difficulty  is  quite  good.  However,  these  ratings  might  be  reflecting 
both  differences  in  the  ease  with  which  segments  are  performed  and 
differences  in  the  pilot's  criterion  for  acceptable  performance,  which 
in  turn  could  be  affected  by  the  knowledge  that  errors  in  the  bomb  run 
and  track  point  segments  are  difficult  to  correct  before  delivery. 

Another  possible  index  of  difficulty  is  the  number  of  reported 
errors for a given segment. Track point and bomb run account for 39%
of all reported errors, significantly more than their share
(χ² = 31.5, p < .001). This suggests that this part of the
maneuver may in fact be the most difficult.
A final piece of relevant information is the degree of
independence  of  ratings  of  track  point  and  bomb  run  performance.  Do 
the  contributions  of  these  two  segments  to  bomb  accuracy  reflect  a 
common  factor  (such  as  being  near  the  end  of  the  maneuver)  or 
relatively  independent  skills?  This  question  can  be  addressed  by 
making  use  of  the  fact  that  the  index  of  association  between  two 
categorical  variables,  each  with  two  levels,  is  equivalent  to  the 
correlation  between  those  variables  when  the  categories  within  each 
variable  are  assigned  arbitrary  "scores,"  i.e.,  1  and  0  (Hays,  1973, 
p.  744).  The  particular  scores  chosen  have  no  effect  on  the 
correlation,  since  it  is  impossible  to  produce  a  nonlinear 
transformation  on  two  scores. 
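
The equivalence described above can be checked numerically. The sketch below, using hypothetical counts and illustrative function names, expands a 2 x 2 table into paired scores, computes the ordinary Pearson correlation, and compares it with the index of association computed directly from the cell counts. It also recodes the categories with different arbitrary scores (keeping their order) to show that the correlation is unchanged.

```python
import math

def pearson(xs, ys):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def expand(a, b, c, d, x_codes=(1, 0), y_codes=(1, 0)):
    """Turn the 2 x 2 table [[a, b], [c, d]] into paired score lists."""
    cells = [(x_codes[0], y_codes[0], a), (x_codes[0], y_codes[1], b),
             (x_codes[1], y_codes[0], c), (x_codes[1], y_codes[1], d)]
    xs, ys = [], []
    for x, y, count in cells:
        xs += [x] * count
        ys += [y] * count
    return xs, ys

# Hypothetical counts, NOT data from the study.
a, b, c, d = 60, 20, 10, 30
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
r_binary = pearson(*expand(a, b, c, d))                   # scores 1 and 0
r_recoded = pearson(*expand(a, b, c, d, (7, 3), (10, -2)))  # other scores
print(round(phi, 4), round(r_binary, 4), round(r_recoded, 4))
```

All three printed values coincide, which is what licenses treating the dichotomized ratings and outcomes as ordinary variables in the regression analysis that follows.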

Using  this  equivalence,  a  multiple  regression  analysis  can  be 
performed  with  those  variables  which  have  already  been  shown  to  have  a 
significant  association  with  bomb  accuracy  via  the  chi-squared  test. 
Table  2  presents  the  results  of  this  analysis.  The  analysis  shows 
that  the  predictive  utility  of  track  point  rating  is  almost  entirely  a 
result  of  covariation  with  bomb  run  ratings.  The  partial  correlation 
between  track  point  ratings  and  accuracy  with  bomb  run  ratings  held 
constant  is  insignificant,  and  the  addition  of  track  point  rating  to 
the  prediction  of  accuracy  increases  the  multiple  correlation  only 
negligibly.  The  results  of  this  analysis  are  consistent  with  the 
hypothesis  that  both  track  point  position  and  bomb  run  performance 
depend  on  a  common  factor. 


Table  2 

Joint  and  Partial  Prediction  of  Accuracy 
Using  Track  Point  and  Bomb  Run  Ratings 


                               TP       BR

Track Point Rating (TP)

Bomb Run Rating (BR)          0.52*

Bomb Accuracy (A)             0.30*    0.40*

Multiple R (TP and BR predicting A) = 0.41*

Partial R (TP predicting A holding BR constant) = 0.11*
Partial R (BR predicting A holding TP constant) = 0.29*

*Indicates significant correlation.
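
The partial and multiple correlations in Table 2 can be approximately recovered from the three pairwise correlations using the standard two-predictor formulas. The sketch below is an illustrative check rather than the authors' computation; the function names are assumptions, and because the inputs are rounded to two decimals the results agree with the table only to within rounding.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# Pairwise correlations as reported in Table 2.
r_tp_a, r_br_a, r_tp_br = 0.30, 0.40, 0.52

partial_tp = partial_r(r_tp_a, r_br_a, r_tp_br)  # TP with A, BR held constant
partial_br = partial_r(r_br_a, r_tp_a, r_tp_br)  # BR with A, TP held constant
multiple = multiple_r(r_tp_a, r_br_a, r_tp_br)
print(round(partial_tp, 2), round(partial_br, 2), round(multiple, 2))
```

From the rounded inputs the formulas give roughly 0.12, 0.30, and 0.41, against the tabled 0.11, 0.29, and 0.41. The pattern is the one described in the text: adding the track point rating raises the multiple correlation only slightly above the bomb run correlation of 0.40.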


CONCLUSIONS 


The  goal  of  the  research  reported  here  was  to  determine  the 
effectiveness of the self-assessment methodology explored by Project
SMART.  Results  of  a  large  scale  test  indicate  that  the  methodology 
was  successful  in  isolating  the  most  critical  elements  in  determining 
weapon  delivery  accuracy  for  the  pop-up  maneuver,  i.e.,  track  point 
position  and  bomb  run. 

The  primacy  of  the  track  point/bomb  run  portion  of  the  maneuver 
was  shown  both  in  the  rating  data  and  in  reported  errors  (used  to 
predict  actual  bomb  scores).  This  result  was  highly  significant  and 
was  obtained  for  both  pilots  and  WSOs  in  each  of  three  different  F-4 
wings  across  the  country.  The  data  also  allowed  the  relative 
frequencies  of  occurrence  of  the  major  reported  track  point/bomb  run 
errors  to  be  categorized  and  compared. 

It  should  be  emphasized  that  these  results  apply  only  to 
experienced  pilots  on  familiar  training  ranges,  where  navigation 
problems  are  minimal.  It  remains  to  be  seen  whether  they  are 
sustained  under  "high-threat,"  novel  environments.  Also,  data  are 
required  from  a  wider  range  of  tactical  tasks,  such  as  air  combat 
maneuvers,  to  determine  how  skill  on  the  pop-up  maneuver  fits  into  the 
overall proficiency of the mission-ready pilot.

It  is  clear  that  pilot  self-assessment  can  be  a  useful  source  of 
data  in  identifying  critical  aircrew  skills.  The  data  obtained  here 
suggest  that  pilots,  when  appropriately  briefed  to  reduce  extraneous 
demand  characteristics  of  the  self-assessment  task,  will  make  diligent 
attempts  at  reporting  their  performance  accurately.  And,  the 
consistent  trend  across  both  ratings  and  error  data  suggests  that 
pilots  do  reliably  discriminate  among  individual  segment  performances 
of  the  pop-up  maneuver. 

The  present  results,  when  combined  with  the  conclusions  of  an 
earlier  examination  of  student  weapon  delivery  skills  using  a  similar 
methodology  (Pierce  et  al.,  1979),  provide  a  reasonable  picture  of  the 
parts  of  the  maneuver  which  contribute  most  to  overall  skill  at 
different stages of mastery. Pierce et al. concluded that, with
practice, F-4 students improved their execution of the initial
segments  of  the  maneuver  most.  The  final  track  point  and  bomb  run 
segments,  however,  remained  the  most  difficult  parts  throughout 
training.  The  data  suggest  that  as  aircrews  become  accurate  enough  to 
qualify  as  mission  ready,  they  can  execute  the  initial  parts  of  the 
maneuver well enough that errors in those segments do not greatly
affect  bomb  scores.  For  these  experienced  pilots,  most  of  the  value 
of  pop-up  training  drills  consists  of  the  opportunity  to  practice  the 
critical  final  seconds  of  the  maneuver.  Therefore,  it  is  possible 
that  a  device  which  simulates  the  rapid  correction  of  position  and 
aiming errors required in the final run could be of some benefit for
continuation  training. 
The pilot self-assessment approach yielded a stable and important
result  in  this  case;  however,  some  of  the  advantages  that  might  be 
expected  from  self-evaluation  techniques  were  not  evident.  For 
example,  it  was  mentioned  earlier  that  getting  subjective  assessments 
ought  to  be  a  relatively  economical  way  to  collect  data  since  only 
minimal  equipment  is  required.  However,  if  a  researcher  must  be 
present  after  each  flight  to  collect  the  data,  the  expense  is  still 
considerable.  During  this  test  of  the  methodology,  some  squadrons 
were  designated  to  receive  full-time  researcher  coverage,  while  others 
were  briefed  on  the  data  collection  method  and  were  asked  to  collect 
data  on  their  own  for  half  or  all  of  the  one  month  test  period.  The 
general  result  is  that  about  three  times  the  number  of  data  forms  were 
received  when  a  researcher  was  present  as  compared  to  voluntary 
participation  by  the  squadrons. 

A  related  problem  is  that,  even  though  the  data  forms  used  could 
be  filled  out  in  a  minute  or  so,  some  pilots  appeared  to  resent  this 
addition  to  their  paperwork  load.  This  might  be  a  problem  in  using 
the  forms  on  a  day  to  day  basis;  an  even  less  obtrusive  method  would 
be  desirable. 

To  conclude,  the  present  analysis  demonstrates  that  useful 
inferences  about  the  components  of  task  performance  can  be  made  from 
pilot  self-assessment  data;  however,  self-assessment  may  not  be  the 
most  feasible  method  when  reasonably  economical  and  unobtrusive  direct 
measures  of  performance  are  available. 
APPENDIX A


MISS  DISTANCE  CRITERIA  FOR  CATEGORIZING  BOMB  SCORES 
FOR  DIFFERENT  EVENTS  (METERS) 


                                    Event

Category    Low Angle Bomb    Low Angle Low Drag    Dive Bomb    Dive Toss

Bull            4.6 m               4.6 m             4.6 m        4.6 m

Hit             32 m                53 m              44 m         50 m